--i-reference-sequences inputs/97_otus-GG_db.qza \
--p-perc-identity 0.97 \
--o-clustered-table closed_ref_cl_97/table-yoga-closed_cl.qza \
--o-clustered-sequences closed_ref_cl_97/rep-seqs-yoga-close_cl.qza \
--o-unmatched-sequences closed_ref_cl_97/unmatched-yoga-close_cl.qza
The above command outputs three artifacts: a feature table, clustered sequences (the
sequences defining the features in the feature table), and unmatched sequences (the
sequences that did not match any reference sequence at 97% identity). The unmatched
sequences are excluded from the feature table and play no part in the downstream analysis.
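Because the unmatched sequences are dropped, it can be useful to check how much of the data survived closed-reference clustering. The following is a minimal sketch, assuming an active QIIME 2 environment and the file names used in this section. The input table path and the ".qzv" output names are assumptions chosen for illustration. It summarizes the table before and after clustering and tabulates the unmatched sequences, so the resulting visualizations can be compared in QIIME 2 View.

```shell
# Paths follow the commands in this section; the input table path and the
# .qzv output names are assumptions made for this sketch.
in_table="inputs/derep-yoga-table.qza"
out_dir="closed_ref_cl_97"
cl_table="${out_dir}/table-yoga-closed_cl.qza"
unmatched="${out_dir}/unmatched-yoga-close_cl.qza"

if command -v qiime >/dev/null 2>&1 && [ -f "$in_table" ] \
    && [ -f "$cl_table" ] && [ -f "$unmatched" ]; then
    # Per-sample and per-feature totals before clustering
    qiime feature-table summarize \
        --i-table "$in_table" \
        --o-visualization "${out_dir}/derep-table-summary.qzv"
    # The same summary after closed-reference clustering
    qiime feature-table summarize \
        --i-table "$cl_table" \
        --o-visualization "${out_dir}/closed-table-summary.qzv"
    # List the sequences that failed to match the reference
    qiime feature-table tabulate-seqs \
        --i-data "$unmatched" \
        --o-visualization "${out_dir}/unmatched-seqs.qzv"
else
    echo "qiime or the expected artifacts were not found; nothing summarized" >&2
fi
```

A large drop in total frequency between the two table summaries would suggest that many reads in the samples have no close relative in the reference database.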
7.3.4.2.1.3.3 Open-Reference Clustering
Open-reference clustering is a hybrid of the above two clustering methods. First, it
clusters the sequences that match the reference sequences, and then it performs de novo
clustering on the unmatched sequences. Open-reference clustering is performed with the
“cluster-features-open-reference” method. The input and output artifacts are the same
as those of closed-reference clustering except that there are no unmatched sequences;
instead, there is an artifact for the new reference sequences, which contains the reference
sequences used as input in addition to the sequences clustered as part of the internal
de novo clustering step. We will create the new subdirectory “open_ref_cl_97” for the
files of the open-reference clustering.
mkdir open_ref_cl_97
qiime vsearch cluster-features-open-reference \
--i-table inputs/derep-yoga-table.qza \
--i-sequences inputs/derep-yoga-seqs.qza \
--i-reference-sequences inputs/97_otus-GG_db.qza \
--p-perc-identity 0.97 \
--o-clustered-table open_ref_cl_97/table-yoga-open_cl.qza \
--o-clustered-sequences open_ref_cl_97/rep-seqs-yoga-open_cl.qza \
--o-new-reference-sequences open_ref_cl_97/new-ref-seqs-open_cl.qza
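A quick way to confirm that the command produced the expected artifacts is to inspect each output with "qiime tools peek", which prints the UUID, semantic type, and data format of an artifact. The loop below is a minimal sketch, assuming an active QIIME 2 environment and the "open_ref_cl_97" directory created above; the variable names are illustrative only. The clustered table should report the type FeatureTable[Frequency], and the two sequence artifacts should report FeatureData[Sequence].

```shell
# Inspect every artifact written by the open-reference clustering step.
out_dir="open_ref_cl_97"
peeked=0

for path in "${out_dir}"/*.qza; do
    # Skip the unexpanded pattern when the directory is empty or missing
    [ -f "$path" ] || continue
    if command -v qiime >/dev/null 2>&1; then
        echo "== ${path}"
        qiime tools peek "$path" && peeked=$((peeked + 1))
    fi
done

echo "artifacts inspected: ${peeked}"
```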
All three clustering methods take a dereplicated feature table and its representative
sequences as input and produce a final feature table and OTU representative sequences
to be used in the downstream analysis: phylogeny, diversity analysis, taxonomic
assignment, and differential abundance analysis of taxa.
7.3.4.2.2 Denoising
Like clustering, denoising also produces a feature table and representative sequences.
However, rather than grouping sequences by an identity threshold, denoising attempts to
identify and remove sequencing errors and thereby to provide more accurate results.
There are two denoising methods available in QIIME2: DADA2 and Deblur. Both methods
output feature tables containing feature abundances and ASVs. Moreover, they also